Grammar Compressed Sequences

Authors

  • Alberto Ordóñez
  • Gonzalo Navarro
  • Nieves R. Brisaboa
Abstract

Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. Several recent applications need to represent highly repetitive sequences, and classical statistical compression proves ineffective. We introduce, instead, grammar-based representations for repetitive sequences, which use up to 6% of the space needed by statistically compressed representations, and support direct access and rank/select operations within tens of microseconds. We demonstrate the impact of our structures in text indexing applications.
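To fix the semantics of the three operations named in the abstract, here is a minimal sketch over a plain, uncompressed sequence. It is not the grammar-based structure the paper introduces; it only illustrates what access, rank, and select return. The indexing conventions (0-based positions, rank over the prefix before position i, 1-based occurrence counts for select) are assumptions made for this illustration.

```python
def access(seq, i):
    """Return the symbol at position i (0-based)."""
    return seq[i]


def rank(seq, c, i):
    """Count the occurrences of symbol c in the prefix seq[0:i]."""
    return sum(1 for s in seq[:i] if s == c)


def select(seq, c, j):
    """Return the position of the j-th occurrence of c (j >= 1).

    Returns -1 when c occurs fewer than j times.
    """
    seen = 0
    for pos, s in enumerate(seq):
        if s == c:
            seen += 1
            if seen == j:
                return pos
    return -1
```

Each of these naive versions scans the sequence, so rank and select take linear time; compressed representations aim to answer the same queries without decompressing the sequence.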

Similar resources

Grammar Compressed Sequences with Rank/Select Support

Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. In several recent applications, the need to represent highly repetitive sequences arises, where statistical compression is ineffective. We introduce grammar-based representations for repetitive sequences, which use up ...

Linear Compressed Pattern Matching for Polynomial Rewriting (Extended Abstract)

This paper is an extended abstract of an analysis of term rewriting where the terms in the rewrite rules as well as the term to be rewritten are compressed by a singleton tree grammar (STG). This form of compression is more general than node sharing or representing terms as dags since also partial trees (contexts) can be shared in the compression. In the first part efficient but complex algorit...

Algorithmics on SLP-compressed strings: A survey

Results on algorithmic problems on strings that are given in a compressed form via straight-line programs are surveyed. A straight-line program is a context-free grammar that generates exactly one string. In this way, exponential compression rates can be achieved. Among others, we study pattern matching for compressed strings, membership problems for compressed strings in various kinds of formal...
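The definition of a straight-line program given in this abstract can be made concrete with a small sketch. The rule encoding (integer nonterminals, each rule either a single terminal or a pair of smaller-numbered nonterminals) and the function names are hypothetical choices for illustration, not taken from the survey. The doubling grammar shows how k + 1 rules can derive a string of length 2^k, and the access routine reads one symbol by descending the grammar instead of expanding it.

```python
def build_doubling_slp(k):
    """Hypothetical example grammar: X0 -> 'a', Xi -> X(i-1) X(i-1)."""
    rules = {0: "a"}
    for i in range(1, k + 1):
        rules[i] = (i - 1, i - 1)
    return rules


def expansion_lengths(rules):
    """Length of the string each nonterminal derives, computed bottom-up.

    Assumes every rule refers only to smaller-numbered nonterminals.
    """
    lengths = {}
    for nt in sorted(rules):
        rhs = rules[nt]
        if isinstance(rhs, str):
            lengths[nt] = 1
        else:
            lengths[nt] = lengths[rhs[0]] + lengths[rhs[1]]
    return lengths


def access(rules, lengths, start, i):
    """Return symbol i (0-based) of the string derived from start."""
    nt = start
    while not isinstance(rules[nt], str):
        left, right = rules[nt]
        if i < lengths[left]:
            nt = left              # target lies in the left half
        else:
            i -= lengths[left]     # skip the left half
            nt = right
    return rules[nt]
```

This descent takes time proportional to the height of the grammar, which is the simple baseline that more refined random-access structures improve on.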

Compact Representations of Event Sequences

We introduce a new technique for the efficient management of large sequences of multidimensional data, which takes advantage of regularities that arise in real-world datasets and supports different types of aggregation queries. More importantly, our representation is flexible in the sense that the relevant dimensions and queries may be used to guide the construction process, easily providing a ...

Optimal Time Random Access to Grammar-Compressed Strings in Small Space

The random access problem for compressed strings is to build a data structure that efficiently supports accessing the character in position $i$ of a string given in compressed form. Given a grammar of size $n$ compressing a string of size $N$, we present a data structure using $O(n\Delta \log_\Delta \frac{N}{n} \log N)$ bits of space that supports accessing position $i$ in $O(\log_\Delta N)$ time for $\Delta \le \log^{O(1)} N$. The query time i...

Publication date: 2016